Abstract

We propose a statistical approach to tornado modeling for predicting and
simulating tornado occurrences and the distribution of accumulated costs
over a time interval. This is achieved by modeling tornado intensity,
measured on the Fujita scale, as a stochastic process. Since the Fujita
scale divides tornado intensity into six states, the intensity can be
modeled with Markov and semi-Markov models. We demonstrate that the
semi-Markov approach is able to reproduce the duration effect that is
detected in tornado occurrence. The superiority of the semi-Markov model
over the Markov chain model is also confirmed by a statistical hypothesis
test. As an application we compute the expected value and the variance of
the costs generated by tornadoes over a given time interval in a given
area. The paper contributes to the literature by demonstrating that
semi-Markov models are an effective tool for the physical analysis of
tornadoes as well as for estimating the resulting economic damage.

Keywords: Tornado modeling, Markov and semi-Markov processes, Reward process


1. Introduction 

Every year tornadoes cause deaths and extensive damage to people and
property. In the USA alone, tornadoes killed on average more than 100
people per year from 2004 to 2013. To give an example of the monetary
damage caused by tornadoes in the USA, in 2013 they cost about 200 million
dollars |in]. In this scenario, the development of techniques to estimate
and model the probabilities of these events is needed and can be of great
benefit to society. Many researchers are working on this subject, see e.g.
[miniin]. The approaches used can typically be divided into two main
groups, one analytical and the other statistical (e.g. see pQ and [5],
respectively).

Here we propose a statistical approach based on a semi-Markov model.
Models of this kind generalize the more common Markov chain models, and
their main feature is the ability to reproduce the duration effect of the
random phenomenon under study. This is made possible by allowing the
sojourn times in the states of the process to follow any type of
probability distribution, including distributions that are not memoryless.
In this work we model tornado intensity as a stochastic process. Tornado
intensity is measured on the Fujita scale, an empirical scale related to
the severity of the damage produced by the tornado. Since the Fujita scale
divides tornado intensity into six states, the intensity can be modeled
with semi-Markov models. The database used in this work is made available
by the National Oceanic and Atmospheric Administration (USA) and contains
roughly 60 000 tornadoes from 1950 to 2013. A semi-Markov model of
tornadoes allows us to estimate the probability that a tornado of a given
intensity occurs at each time in a given location. It also makes it
possible to compute the total cost of the damage caused by tornadoes,
which is a relevant indicator of environmental hazard.

The paper is organized as follows. In the next section we introduce the
database and the object of investigation. In Section 3 we present the
semi-Markov model and the related reward (cost) process. Section 4 shows
the main application of the model to the tornado process. Finally, in
Section 5 we give some concluding remarks.

Figure 1: Geographical distribution of the database's events, taken from
http://www.spc.noaa.gov/gis/svrgis/images/tornado.png

2. Database 

The data used in this work come from the National Oceanic and Atmospheric
Administration's (NOAA) National Weather Service and are freely available
at www.spc.noaa.gov/wcm/#data. The database collects roughly 60 000 events
from 1950 to 2013, all of them located in the USA (see Figure 1).

For each event the date, time, state, F-scale, injuries, fatalities,
starting latitude and longitude, and ending latitude and longitude are
recorded. The physical quantity of interest here is the F-scale (the
Fujita scale). This is an empirical scale that measures tornado intensity
based on the damage produced to man-made structures. It can also be
approximately related to wind speed: for example, for a tornado classified
F0 the wind speed ranges from 64 to 116 km/h, whereas for an F5 tornado it
ranges from 419 to 512 km/h [6]. As is well known, the Fujita scale admits
six values of tornado intensity, from F0 to F5. Since the scale is
discrete, tornado intensities measured on the Fujita scale can be
naturally modeled through semi-Markov models.

3. Semi-Markov Process 

We define a homogeneous semi-Markov process with values in a finite state
space $E = \{1, 2, \ldots, m\}$, see for example |Hl[Z|-. Let
$(\Omega, \mathcal{F}, P)$ be a probability space; we consider two
sequences of random variables $J = (J_n)_{n \in \mathbb{N}}$ and
$T = (T_n)_{n \in \mathbb{N}}$, where

$$J_n : \Omega \to E, \qquad T_n : \Omega \to \mathbb{N}.$$

They denote the state and the time of the n-th transition of the system,
respectively. In our application $J_n$ is the intensity of the n-th
tornado and $T_n$ the time of its occurrence.

We assume that $(J, T)$ is a Markov renewal process on the state space
$E \times \mathbb{N}$ with kernel $Q_{ij}(t)$, $i, j \in E$,
$t \in \mathbb{N}$. The kernel has the following probabilistic
interpretation:

$$P[J_{n+1} = j, T_{n+1} - T_n \le t \mid \sigma(J_h, T_h), h \le n, J_n = i]
= P[J_{n+1} = j, T_{n+1} - T_n \le t \mid J_n = i] = Q_{ij}(t), \tag{1}$$

where $\sigma(J_h, T_h)$, $h \le n$, represents the set of past values of
the Markov renewal process $(J, T)$. Relation (1) asserts that knowledge
of the last tornado's intensity suffices to give the conditional
distribution of the couple $(J_{n+1}, T_{n+1} - T_n)$, whatever the past
values of the variables might be.

It is simple to see that
$p_{ij} := P[J_{n+1} = j \mid J_n = i] = \lim_{t \to \infty} Q_{ij}(t)$,
$i, j \in E$, $t \in \mathbb{N}$, where $P = (p_{ij})$ is the transition
probability matrix of the embedded Markov chain $J_n$.
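
The kernel and the embedded transition probabilities can be estimated by
simple counting. The following is a minimal Python sketch, not the authors'
implementation: the function name `estimate_kernel` and the input layout (a
list of visited states and a list of integer jump times) are assumptions
made here for illustration.

```python
from collections import Counter, defaultdict


def estimate_kernel(states, times):
    """Empirical Q_ij(t) and p_ij from one trajectory (J_n, T_n).

    states: visited states J_0, J_1, ... (hypothetical encoding, e.g. 1..6)
    times:  integer jump times T_0 < T_1 < ...
    """
    pair_sojourns = defaultdict(list)   # waiting times seen for each i -> j
    exits = Counter()                   # number of transitions out of state i
    for n in range(len(states) - 1):
        i, j = states[n], states[n + 1]
        pair_sojourns[(i, j)].append(times[n + 1] - times[n])
        exits[i] += 1

    def Q(i, j, t):
        """Share of exits from i that go to j within t time units."""
        if exits[i] == 0:
            raise ValueError(f"state {i} is never left in the sample")
        hits = sum(1 for w in pair_sojourns.get((i, j), []) if w <= t)
        return hits / exits[i]

    def p(i, j):
        """Empirical p_ij, i.e. the value Q_ij(t) reaches for large t."""
        if exits[i] == 0:
            raise ValueError(f"state {i} is never left in the sample")
        return len(pair_sojourns.get((i, j), [])) / exits[i]

    return Q, p
```

For any $t$ at least as large as the longest observed waiting time,
`Q(i, j, t)` coincides with `p(i, j)`, which mirrors the limit relation
stated above.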

Simple probabilistic reasoning allows the computation of the conditional
probability distribution of the sojourn time $T_{n+1} - T_n$ in the state
$J_n$ given that the next visited state is $J_{n+1}$. In formula:

$$G_{ij}(t) := P\{T_{n+1} - T_n \le t \mid J_n = i, J_{n+1} = j\} =
\begin{cases}
Q_{ij}(t) / p_{ij} & \text{if } p_{ij} \neq 0 \\
1 & \text{if } p_{ij} = 0
\end{cases} \tag{2}$$

$G_{ij}(\cdot)$ denotes the waiting time distribution function in state
$i$ given that, with the next transition, the process will be in state
$j$. The sojourn time distribution $G_{ij}(\cdot)$ can be any distribution
function. We recover the discrete time Markov chain when the
$G_{ij}(\cdot)$ are all geometrically distributed. We should therefore
find out whether the inter-arrival times between two tornadoes of given
intensities follow a geometric distribution or not. This is a primary
question, which we answer in Section 4.1.
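
A short worked check of why the geometric case is the Markovian one may be
useful. The parameter $p \in (0, 1)$ and the symbol $W$ for the sojourn time
$T_{n+1} - T_n$ are introduced here only for illustration. If
$P[W = t] = p (1 - p)^{t-1}$ for $t = 1, 2, \ldots$, then
$P[W > t] = (1 - p)^t$ and

$$P[W > s + t \mid W > s] = \frac{(1 - p)^{s+t}}{(1 - p)^{s}} = (1 - p)^{t} = P[W > t].$$

The time already spent in the state therefore carries no information about
the remaining waiting time. For any sojourn distribution without this
property the elapsed time does matter, which is the duration effect the
semi-Markov model is meant to capture.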

Now it is possible to define the time homogeneous semi-Markov chain
$Z(t)$ as

$$Z(t) = J_{N(t)}, \quad \forall t \in \mathbb{N} \tag{3}$$

where $N(t) = \sup\{n \in \mathbb{N} : T_n \le t\}$. Then $Z(t)$
represents the state of the system at each time $t$.

At this point we introduce the discrete backward recurrence time process
linked to the semi-Markov chain. For each time $t \in \mathbb{N}$ we
define the following stochastic process:

$$B(t) = t - T_{N(t)}. \tag{4}$$

It denotes the time elapsed from the occurrence of the last tornado to the
current time $t$.
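
Both $Z(t)$ and $B(t)$ can be read off the recorded jump times with a
single binary search. A minimal Python sketch follows; the function name
`state_and_backward` and the list-based inputs are assumptions made here for
illustration.

```python
from bisect import bisect_right


def state_and_backward(states, times, t):
    """Return (Z(t), B(t)) given states J_0, J_1, ... and sorted jump times."""
    n = bisect_right(times, t) - 1      # N(t) = sup{n : T_n <= t}
    if n < 0:
        raise ValueError("t precedes the first recorded jump time")
    return states[n], t - times[n]      # Z(t) = J_{N(t)}, B(t) = t - T_{N(t)}
```

With `states = [1, 2, 1, 3]` and `times = [0, 3, 4, 9]`, the call
`state_and_backward(states, times, 8)` returns `(1, 4)`: the last event
before time 8 occurred at time 4 and had state 1.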

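The joint stochastic process $(Z(t), B(t), t \in \mathbb{N})$ with values
in $E \times \mathbb{N}$ is a Markov process. That is:

$$P[Z(T) = j, B(T) = v' \mid \sigma(Z(h), B(h)), h \le t, Z(t) = i, B(t) = v]
= P[Z(T) = j, B(T) = v' \mid Z(t) = i, B(t) = v]
=: {}^{b}\phi^{b}_{ij}(v; v', T - t),$$

with the following evolution equation, see e.g. [3]:

$${}^{b}\phi^{b}_{ij}(v; v', t) =
\delta_{ij} \frac{1 - \sum_{a \in E} Q_{ia}(t + v)}{1 - \sum_{a \in E} Q_{ia}(v)} 1_{\{v' = t + v\}}
+ \sum_{k \in E} \sum_{s=1}^{t} \frac{q_{ik}(s + v)}{1 - \sum_{a \in E} Q_{ia}(v)} \, {}^{b}\phi^{b}_{kj}(0; v', t - s), \tag{5}$$

where $\delta_{ij}$ is the Kronecker delta and
$q_{ik}(s) := Q_{ik}(s) - Q_{ik}(s - 1)$.
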

Expression (5) provides the probability of having a tornado of intensity
$j$ after $t - v'$ periods and no additional tornado within the times
$\{t - v' + 1, t - v' + 2, \ldots, t\}$, given that the last tornado
occurred $v$ periods before the present time and was of intensity $i$.

We can now define the accumulated discounted reward (cost), $\xi(t)$,
during the time interval $(0, t]$, by the following relation,

$$\xi(t) = \sum_{n=1}^{N(t)} \psi_{J_n} e^{-\delta T_n}, \tag{6}$$

where $\psi_{J_n}$ is the cost caused by the n-th tornado, which had
intensity $J_n$. This cost has to be discounted using a deterministic
force of interest $\delta$ and the time $T_n$ of occurrence of the event.
The total damage over the time interval $(0, t]$ is obtained by summation
over the random number of tornadoes $N(t)$ up to time $t$.

In the application section we will compute the expected value
$E[\xi(t)]$ and the second order moment $E[\xi^2(t)]$. For an extended
treatment of the semi-Markov reward process see e.g. [T^ .
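
For a single observed or simulated trajectory the accumulated discounted
cost is a plain sum. The Python sketch below assumes that discounting with a
force of interest $\delta$ means multiplying the cost of the n-th event by
$e^{-\delta T_n}$; the function name and the dictionary `cost`, which maps a
state to its cost, are hypothetical.

```python
import math


def accumulated_discounted_cost(states, times, cost, delta, t):
    """Sum cost[J_n] * exp(-delta * T_n) over events n >= 1 with T_n <= t."""
    total = 0.0
    for n in range(1, len(states)):     # n = 0 is the initial condition
        if times[n] > t:                # jump times are sorted, so stop here
            break
        total += cost[states[n]] * math.exp(-delta * times[n])
    return total
```

With `delta = 0` the function returns the undiscounted total damage up to
time `t`, which is a convenient sanity check on real data.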

4. Application to real data

4.1. Test

The first step of our application is to test the validity of the Markov
chain hypothesis. To do that we apply a hypothesis test proposed by na,
briefly described here. As already stated, the model can be considered
Markovian if the sojourn times are geometrically distributed. The
probability distribution function of the sojourn time in state $i$ before
making a transition to state $j$ has been denoted by $G_{ij}(\cdot)$.
Define the corresponding probability mass function by

$$g_{ij}(t) = P\{T_{n+1} - T_n = t \mid J_n = i, J_{n+1} = j\} =
\begin{cases}
G_{ij}(t) - G_{ij}(t - 1) & \text{if } t > 1 \\
G_{ij}(1) & \text{if } t = 1
\end{cases} \tag{7}$$

Under the geometric hypothesis the equality
$g_{ij}(1)(1 - g_{ij}(1)) - g_{ij}(2) = 0$ must hold, so a sufficiently
strong deviation from this equality has to be interpreted as evidence
against the Markovian hypothesis and in favor of the semi-Markov model.
The test statistic is the following:

$$\hat{S}_{ij} = \frac{\sqrt{N(i,j)} \, \bigl( \hat{g}_{ij}(1)(1 - \hat{g}_{ij}(1)) - \hat{g}_{ij}(2) \bigr)}
{\sqrt{\hat{g}_{ij}(1) (1 - \hat{g}_{ij}(1))^{2} (2 - \hat{g}_{ij}(1))}} \tag{8}$$

where $N(i,j)$ denotes the number of transitions from state $i$ to state
$j$ observed in the sample and $\hat{g}_{ij}(x)$ is the empirical
estimator of the probability $g_{ij}(x)$, given by the ratio between the
number of transitions from $i$ to $j$ occurring exactly after $x$ units of
time and $N(i,j)$. Under the geometric hypothesis $H_0$ (or Markovian
hypothesis), this statistic has approximately the standard normal
distribution, see [12].

We applied this procedure to our data, running the tests at a significance
level of 5%. Because we have 6 states, we estimated the 6 x (6 - 1) = 30
waiting time distribution functions and for each of them we computed the
value of the test statistic (8). The geometric hypothesis is rejected for
17 of the 30 distributions. Table 1 shows the results of the test applied
to the waiting time distribution functions for a few pairs of states.

state i    state j    score    decision
1          2          9.79     H0 rejected
1          3          4.43     H0 rejected
3          1          4.24     H0 rejected
4          1          5.50     H0 rejected

Table 1: Results of the test

The large values of the test statistic suggest the rejection of the
Markovian hypothesis in favor of the more general semi-Markov one.
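
The score for one pair of states can be computed directly from the waiting
times observed for that pair. The Python sketch below is not the authors'
code: it assumes the statistic has the form
$\sqrt{N}\,(g_1(1 - g_1) - g_2) / \sqrt{g_1 (1 - g_1)^2 (2 - g_1)}$, where
$g_1$ and $g_2$ are the empirical frequencies of waiting times equal to 1
and 2, and the function name is hypothetical.

```python
import math


def geometric_test_score(sojourns):
    """Score for H0: the waiting times of one (i, j) pair are geometric.

    sojourns: observed waiting times (positive integers) for transitions i -> j.
    """
    n = len(sojourns)                               # N(i, j)
    g1 = sum(1 for w in sojourns if w == 1) / n     # empirical g_ij(1)
    g2 = sum(1 for w in sojourns if w == 2) / n     # empirical g_ij(2)
    denom = math.sqrt(g1 * (1 - g1) ** 2 * (2 - g1))
    if denom == 0:
        raise ValueError("score undefined when the empirical g_ij(1) is 0 or 1")
    return math.sqrt(n) * (g1 * (1 - g1) - g2) / denom
```

Since the score is approximately standard normal under the geometric
hypothesis, it is compared with a normal quantile. The text does not state
whether a one-sided or a two-sided rejection rule is used, so that choice is
left to the caller.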


4.2. Probability Transition Matrices

To set up the Markov model and the semi-Markov model described in the
previous section, we use the Matlab application Semi-Markov Toolbox [2].
This application allows one to create Markov and semi-Markov models
starting from real discrete data of a given phenomenon. The outputs
consist of the probability transition matrices and of synthetic time
series, of the same length as the real one, generated by means of Monte
Carlo simulation. The matrices are the core of the models and allow them
to be used for different purposes, such as time series generation,
forecasting and simulation of the phenomenon of interest.

Figure 2: Transition probability matrix of the embedded Markov model
(axes: state i, state j).

The Monte Carlo algorithm consists of repeated random sampling to compute
the successively visited states $\{J_0, J_1, \ldots\}$ up to the horizon
time $L$. The difference between the semi-Markov and the Markov simulation
is that the former also considers the jump times $\{T_0, T_1, \ldots\}$.
The algorithm for the semi-Markov model consists of 4 steps:

1) Set $n = 0$, $J_0 = i$, $T_0 = 0$, horizon time $= L$;

2) Sample $J$ from $p_{J_n, \cdot}$ and set $J_{n+1} = J(\omega)$;

3) Sample $W$ from $G_{J_n, J_{n+1}}$ and set $T_{n+1} = T_n + W(\omega)$;

4) If $T_{n+1} > L$ stop,
   else set $n = n + 1$ and go to 2).
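
The four steps above can be sketched in Python as follows. This is an
illustration only and is unrelated to the Matlab toolbox: the function name,
the nested dictionary `p` holding the rows of the embedded transition
matrix, and the callable `sample_sojourn` that draws a waiting time for a
given pair of states are all assumptions.

```python
import random


def simulate_semi_markov(p, sample_sojourn, i0, horizon, rng=None):
    """Simulate states J_n and jump times T_n up to the horizon time.

    p:              dict mapping state i to a dict {j: p_ij}
    sample_sojourn: callable (i, j, rng) -> positive integer waiting time
    """
    rng = rng or random.Random()
    states, times = [i0], [0]                              # step 1
    while True:
        i = states[-1]
        candidates = list(p[i])
        weights = [p[i][k] for k in candidates]
        j = rng.choices(candidates, weights=weights)[0]    # step 2
        t_next = times[-1] + sample_sojourn(i, j, rng)     # step 3
        if t_next > horizon:                               # step 4: stop and
            return states, times                           # drop the last draw
        states.append(j)
        times.append(t_next)
```

The waiting times must be strictly positive for the loop to terminate. A
Markov chain simulation is the special case in which `sample_sojourn` draws
from a geometric distribution.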

Below we show the results of the application in terms of the transition
probability matrices of the two models. In particular, Figure 2 shows
graphically the transition probability matrix of the embedded Markov
model.

Figures 3 and 4 show, instead, the transition probability matrices of the
semi-Markov model. The matrices are plotted by varying the time $t$ with
$v = 1$ fixed (Figure 3) and by varying the backward recurrence time $v$
with $t = 1$ fixed (Figure 4). The tornado process depends more strongly
on the backward recurrence time than on the time. This is evident in
Figure 4, where small variations of the backward recurrence time produce
large variations in the transition probability matrices. Figure 4 also
highlights this strong dependence through the extreme states. For example,
after an F5 tornado (state 6), the probability that the next tornado has
the same intensity increases as the backward recurrence time increases. A
similar observation can be made for the virtual transition into state 1,
which corresponds to F0 intensity. More generally, as the backward
recurrence time increases, probability mass moves onto the main diagonal
of the transition probability matrices.

4.3. Reward application

As a further application of the proposed model we apply the reward model
to the tornado time series. In particular, we transform the original
process into the costs that a state has to pay for tornado damage. To do
this we apply the results of [10]. The Fujita scale is thus transformed
into costs: 8689, 62440, 121141, 146564, 177824 and 89192 dollars are the
mean costs of tornadoes of degree 0, 1, 2, 3, 4 and 5, respectively. As
previously said, we compute the expected value and the variance of the
accumulated discounted reward, see Figure 5 and Figure 6, respectively. In
both figures the continuous lines refer to real data and the dashed lines
to synthetic data. The quantities are shown as functions of the number of
tornadoes, and we highlight their dependence on the current state $i$ and
on the backward process $v$ by varying them. The semi-Markov model
captures the behavior of the real data well, especially for small numbers
of tornadoes.
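
One simple way to obtain such moments is to average over many trajectories.
The Python sketch below uses the mean costs listed above. The path layout (a
list of `(time, F-scale)` events), the function names and the exponential
discount factor are assumptions, and sample moments over paths stand in here
for whatever method the authors used.

```python
import math
from statistics import fmean, pvariance

# Mean cost in dollars per tornado for F0..F5, as listed in the text.
MEAN_COST = {0: 8689, 1: 62440, 2: 121141, 3: 146564, 4: 177824, 5: 89192}


def discounted_cost(path, delta):
    """path: list of (time, f_scale) events; returns the discounted total."""
    return sum(MEAN_COST[f] * math.exp(-delta * t) for t, f in path)


def reward_moments(paths, delta):
    """Sample mean and population variance of the discounted cost over paths."""
    totals = [discounted_cost(path, delta) for path in paths]
    return fmean(totals), pvariance(totals)
```

The same function can be applied to paths cut from the real series and to
synthetic paths, which gives the two curves being compared.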

5. Conclusion 

In this paper we model the statistical behavior of tornadoes in a vast
region of the USA. To do this we use a first order semi-Markov model,
which is more general than the Markov chain model. We show, through a
statistical test, that the latter is not able to capture the duration
effect of tornadoes. The more general semi-Markov model, by treating the
time spent in a given state as generated by a distribution that is not
memoryless, is able to reproduce the duration effect. Moreover, since we
believe that the cost of tornado damage is a serious problem related to
this natural phenomenon, as an economic application we compute the
expected value and the variance of the accumulated discounted cost and we
show its dependence on the intensity and the duration of the initial
tornado.

Figure 3: Transition probability matrix of the semi-Markov model varying
the time t.

Figure 4: Transition probability matrix of the semi-Markov model varying
the backward v.

Figure 5: Expected value of the accumulated discounted reward. Comparison
between real (continuous line) and synthetic data (dashed line).

Figure 6: Variance of the accumulated discounted reward. Comparison
between real (continuous line) and synthetic data (dashed line).